SCHEDULING PARALLEL PROCESSES WITHOUT A COMMON SCHEDULER


EXTENDED  ABSTRACT 


George  Holober  and  Lawrence  Snyder 
Department  of  Computer  Science 
Yale  University 
New  Haven,  Connecticut  06520 


Abstract: An algorithm which solves the critical
section problem for distributed processes is
presented. We extend the solution of Lamport
[LL76] by continuing to allow processes to access
their respective critical sections in any
arbitrary user-specified order, but with greatly
reduced storage requirements for each process. In
addition, we supply a facility for testing the
presence of deadlock among processes waiting to
enter their critical code. We show our scheme to
be tolerant of several malfunctioning processors,
and derive an equation relating the probability of
total system failure to the probability of many
individual failures occurring simultaneously among
the processors.


INTRODUCTION 


The "critical section" problem, which
involves developing a synchronization scheme for a
set of processes that enforces solo occupancy of
common code, is further complicated when we
generalize the circumstances under which the
scheme will work or restrict the allowable
solutions in some manner. For example, we will
assume that the processes execute asynchronously
(i.e. nothing is known about one process' rate of
execution relative to that of another process nor
to the same process' rate of execution at a
different time) and that each process must have
the same solution as every other process. Another
reasonable objective is to avoid possible deadlock
resulting from two or more processes waiting for
each other.

A number of solutions to the critical section
problem have been developed and studied since
Dijkstra's initial paper [EWD65]. The results
reported in that paper, along with the subsequent
refinements outlined by Knuth [DEK], deBruijn
[deB], and Eisenberg and McGuire [EM], assumed
that concurrent processes would be implemented on
multiprogrammed systems. These systems allow
different processes to read from or write into any
memory location.

Only recently have researchers begun to look
at multiprocessor or distributed systems. In such
a system, a process may read or write in its local
memory and may read from another processor's
memory, but may not write into another processor's
address space. This restriction prevents the use
of global variables, but does yield one important
advantage over multiprogrammed computers: if one
process fails, the entire system does not
necessarily crash, though system performance will
likely be degraded.

One of the first examinations of distributed
systems was done by Dijkstra [EWD74]. This paper
studied the possibility of processors
independently recognizing that they had failed and
correcting themselves to some prescribed state.
At about the same time Lamport [LL74] presented a
solution to Dijkstra's original problem with
critical sections that obeyed the constraints of
distributed computers. Rivest and Pratt [RP]
improved upon this scheme by bounding the values
of the variables necessary for inter-process
coordination and by preventing a process that
continually fails and restarts from deadlocking
the system. Further improvements (in terms of
smaller ranges of values for variables, greater
fairness when sequencing processes for entry into
their critical regions, and reduced waiting times
for processes before entering their respective
regions) were developed by Peterson and Fischer
[PF]. Finally, Katseff [HPK] incorporated the
best aspects of each of these solutions, including
the servicing of processes in the order in which
they arrive (FIFO), into one algorithm.

Taking a somewhat different approach, Lamport
[LL76] recognized the fact that it is not always
desirable to allow processes to enter their
critical regions in the same order in which they
attempted to access these regions. It is
frequently the case that a process may not
conflict with another process in the sense that
they may enter critical regions simultaneously,
though both these processes may conflict with a
third process. Furthermore, given a set of
processes that are currently prevented from
entering their critical regions, we may wish to
impose some priority on these processes so that
when conflicting processes eventually do leave
their critical regions, the process having the
highest priority, rather than the process that has
been waiting the longest, will be the first to
access its own region.

In  this  paper  we  present  a modification  of 
Lamport's  system  that  corrects  some  drawbacks  of 
both  his  and  Katseff's  solutions.  In  particular: 

1) We maintain the basic capabilities of
Lamport's design but add a facility to detect the
formation of anomalous situations in which a set
of processes will deadlock because each process
believes another process has priority over it.

2)  One  variable  that  is  used  in  Lamport's 
solution  may  grow  unboundedly  large  (though  in 
practice  this  may  have  little  effect).  We  show 
how  to  limit  to  a finite  range  the  possible  values 
of  all  variables  used  for  synchronization 
purposes. 

3) Lamport's and Katseff's code require that
each process contain an array, the length of which
is equal to the total number of processes. With
the recent advances in computer-on-a-chip hardware
designs, it is quite likely that future machine
architectures will involve huge numbers of
communicating processors (capable of running a
proportionately large number of processes), each
processor possessing a fairly limited amount of
memory. Such a hardware scheme is clearly
incompatible with Lamport's and Katseff's
routines. In our program, each process will need
to keep track of only a constant number of other
processes.


SYSTEM  OVERVIEW 


The  architecture  of  the  system  we  will  use 
for  our  studies  is  conceptually  simple:  we  have  a 
set  of  processors,  each  processor  capable  of 
executing  at  most  one  process  from  a set  of  N 
processes,  and  each  processor  communicating  with  a 
subset  of  the  other  processors.  By  "communicate" 
we  mean  that  one  processor  may  read  from  another's 
memory or possibly transmit an interrupt signal
(this  latter  condition  is  not  essential); 
however,  conforming  to  the  definition  of  a truly 
distributed  system,  it  may  not  store  into  any 
memory  but  its  own. 

We further assume that a processor may fail,
though it does so in a somewhat orderly fashion.
A read request issued to a process immediately
after this process has malfunctioned may return
arbitrary values. Eventually only some default
value will be returned by read requests to a
failing processor, hence it is impossible to
accurately examine the memory contents of such a
processor. Each processor has the ability to
detect its own deviation from normal operating
protocol and shut itself down without transmitting
spurious interrupts and without writing incorrect
information on a disk to which it is linked. The
process that had been running on a processor until
that processor malfunctioned may be restarted at
some predefined point.

As noted in the previous section, the early
solutions to the critical section problem require
disjoint processes to store into common memory
locations. Many of the synchronization schemes
that have been proposed to date, such as PV
[EWD68], monitors [CARH74], and path expressions
when implemented in terms of semaphores [CH, AKH],
seem to rely upon a dedicated scheduling routine.
Unfortunately, such schemes are incompatible with
the desired autonomy of processors. For if the
processor in which global data is stored or a
dedicated scheduler should fail, the entire system
fails. Lamport has explored many aspects of a
synchronization scheme that avoids this drawback,
though he only touches briefly upon the issue of
scheduling. We examine this last issue in greater
detail.

The synchronization primitive used by Lamport
is an extension of the conditional critical region
first proposed by Hoare [CARH71] and later
described by Brinch-Hansen [PBH72, PBH73a,
PBH73b]. This new primitive takes the form

region <mode> when <condition>
do <critical-section> od

The metavariable <mode> is an expression
(typically a constant or a single variable) which
evaluates to an element of some arbitrary finite
set M (subject to Restriction #1 below);
<condition> is a Boolean expression;
<critical-section> is an arbitrary length of code
(subject to Restriction #3) which comprises the
critical region.

It may not be the case that all critical
regions will conflict with all other critical
regions in the sense that we may desire two
processes to be executing their critical regions
simultaneously, though either or both of these
processes may in turn prevent a third process from
entering its region. To formalize this notion, we
define a symmetric, time-independent function
conflict: M x M --> {true, false}. We then say
that two processes conflict if and only if they
are both attempting to execute region statements
with respective <mode> values of mode1 and mode2,
and conflict (mode1, mode2) = true.
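
As an illustration of the definition above, the following is a minimal sketch of one possible conflict function. The mode set {"read", "write"} and the choice of which pairs conflict are assumptions made for the example (a readers/writers discipline); the paper leaves M arbitrary. Storing unordered pairs guarantees the symmetry the definition demands.

```python
# Hypothetical mode set M; the paper allows any finite set.
MODES = ("read", "write")

# Unordered pairs of modes that conflict. A frozenset of two equal
# modes collapses to a one-element set, so ("write", "write") is
# represented by frozenset({"write"}).
_CONFLICTING = {
    frozenset(("read", "write")),
    frozenset(("write", "write")),
}


def conflict(mode1, mode2):
    """Symmetric, time-independent conflict test on two modes."""
    return frozenset((mode1, mode2)) in _CONFLICTING
```

Under this choice two readers may occupy their critical regions together, while a writer excludes everyone else.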

The semantics of the region statement can be
stated quite simply: the code in the
<critical-section> may not begin execution if a
conflicting process has already entered the
<critical-section> of a region statement or if
<condition> evaluates to false. To prevent
certain anomalous situations from arising, we must
enforce the following restrictions on our
synchronization primitive:

Restriction #1: The value of <mode> must remain
constant during the entire execution of the
region statement to which it is associated.

Restriction #2: To prevent races between
instructions which alter and examine a when
<condition>, arguments of the <condition> of
one process' region statement which are
stored in the memory of another process may
only be modified by this second process
within a region statement which conflicts
with the first region statement.


Note that if the <condition> of a region statement
does not depend upon the contents of another
process' address space, then this <condition> must
always evaluate to true, for if this were not so,
then the process would enter the region statement,
halt execution until the <condition> became true,
thereby preventing assignments to the very
variables that can satisfy the <condition> and
causing the process to deadlock with itself.


Restriction #3: A region statement may not be
one of the instructions in the
<critical-section> of another region
statement.

One problem frequently associated with
conditional critical regions is the difficulty
they pose in expressing some synchronization
problems. These problems usually have a
"scheduling" flavor to them: given a set of
conflicting processes that are all competing to
enter their respective critical regions, which
will take precedence? To remedy this flaw, we
define a new function must precede: {1, 2, ..., N}
x {1, 2, ..., N} --> {true, false} which may
depend upon any information that is available to
the system. Therefore given a particular i and j
in the set {1, ..., N}, must precede (i, j) need
not remain constant over a period of time.
(Lamport actually calls this function
"should precede"; we will save this term to
denote a different function.)

This very general definition of must precede
is actually too permissive. The following
argument illustrates this point. Suppose that in
addition to i and j, the names of the two
processes, the function must precede depends on K
other sources of information, e.g. which
processes are in their critical regions, which
processes are awaiting permission to enter their
regions and how long they've been in this state,
which processes have failed, the values stored in
the memories of various processes, etc. It is
very unlikely that a process can examine all K+2
arguments and instantly determine the value of
must precede (i, j). Rather, the process would
probably scan one or two arguments at a time and
combine this information with previously computed
results to obtain a partial answer. This
procedure would repeat until all arguments
have been examined and must precede (i, j) has
been fully determined. Consider the case in which
a process is scanning the xth argument of
must precede (x is in the interval [2, ..., K+2])
when another process alters the value of the yth
argument (y is in the interval [1, ..., x-1]).
The first process will never rescan the yth
argument, so the value it finally obtains for
must precede (i, j) will be incorrect. To
overcome this difficulty, Lamport assumes that
must precede is strongly constant, meaning that
its value will not change when we are in the midst
of computing it. This convention simplifies
matters greatly (and in fact probably does not
pose a severe restriction), so we will adopt it as
well.

The interpretation of the must precede
function is self-evident, but it is important to
point out that it has meaning only on those
processes that are simultaneously waiting to enter
their critical regions and that conflict with one
another. Putting together the mechanisms we have
described so far, it becomes clear that a process
i can enter the <critical-section> of a region
statement only if the following three conditions
are satisfied:


Condition #1: All processes that conflict with
process i are executing code outside of
their critical regions.

Condition #2: The when <condition> evaluates to
true.

Condition #3: For all processes j that are
presently executing region statements but
have not yet entered their
<critical-section>'s, and that conflict with
process i,

must precede (i, j) = true   if j has been
                             waiting longer than i
                      false  if i has been
                             waiting longer than j

In other words, of all the processes that do
not conflict with another process that is in a
<critical-section> (#1), that have true when
<condition>'s (#2), and that have no predecessors
(in the sense that there is no conflicting process
j for which must precede (j, i) holds true), time
of arrival is the final arbiter (#3). We impose
one more condition on our system that guarantees
that no process can be locked out of a
<critical-section> once it has begun executing a
region statement:

Condition #4: Assuming no further processes
encounter region statements, a process
satisfying Conditions 1 - 3 will enter its
<critical-section> after a finite delay.

This condition will follow if we assume that all
processes make progress executing their
instructions (though our previous assumption of
asynchronous operation may make this progress very
slow) and if a permanent deadlock situation does
not exist among the processes that are waiting to
enter their critical regions.
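
The entry rules can be sketched as a single predicate. This is only one reading of Conditions #1 to #3, following the prose summary ("no predecessors", with time of arrival as the final arbiter); the function name can_enter, the argument layout, and the use of arrival stamps are assumptions made for illustration, and the sketch presumes a global view that the distributed algorithm itself does not have.

```python
def can_enter(i, in_critical, waiting, mode, condition, arrival,
              conflict, must_precede):
    """Return True if waiting process i may enter its critical section.

    in_critical, waiting: collections of process ids
    mode[p]:      <mode> value of p's region statement
    condition[p]: zero-argument callable for p's when <condition>
    arrival[p]:   arrival stamp (smaller means waiting longer)
    """
    # Condition #1: no conflicting process is inside a critical section.
    for j in in_critical:
        if conflict(mode[i], mode[j]):
            return False

    # Condition #2: the when <condition> holds.
    if not condition[i]():
        return False

    # Condition #3 (assumed reading): among waiting, conflicting
    # processes, i has no predecessor, and any process that has waited
    # longer than i is explicitly overridden by must_precede(i, j).
    # j may equal i, which reproduces the self-deadlock case.
    for j in waiting:
        if not conflict(mode[i], mode[j]):
            continue
        if must_precede(j, i):
            return False
        if arrival[j] < arrival[i] and not must_precede(i, j):
            return False
    return True
```

With must_precede identically false the predicate reduces to first-come, first-served among conflicting processes; a priority scheme is obtained by making must_precede (i, j) true for the pairs that should override arrival order.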
In the last section we briefly mentioned the
possibility of two or more processes causing a
deadlock while waiting to enter critical regions.
To see how this might happen, consider the most
trivial case for the moment. Suppose that process
i has just encountered the statement

region mode1 when true do <anything> od

where conflict (mode1, mode1) = true and
must precede (i, i) = true. Using our rules for
selecting processes to enter their
<critical-section>'s, process i must wait for
itself to leave its <critical-section> before it
can enter it, a clear impossibility. A deadlock
is present, and Condition #4 is violated (unless
must precede (i, i) changes to false at some
future point). Although this may seem like a
contrived example, and therefore not a very
convincing justification for our attempts to
determine the existence of deadlocks, these
deadlocks can arise in far more subtle ways. The
following theorem characterizes the situations in
which a deadlock will be present.

Cycle Theorem: A deadlock will exist among the
processes that are awaiting entrance to their
critical regions if and only if there exists
a subset {P(0), P(1), ..., P(L-1)} of these
processes which form a "cycle" in the sense
that for all i in the set {0, 1, ..., L-1}

(1) P(i) is in a region statement with <mode>
value M(i), and

(2) the functions must precede (P(i), P(i+1 mod
L)) and conflict (M(i), M(i+1 mod L))
evaluate to true.

Proof: The "if" part follows immediately from our
definitions. The "only if" part stems from the
following fact: if we trace backwards over the
must precede and conflict relations on a finite
set of processes, we must eventually either return
to a process which has already been visited
(thereby showing the presence of a cycle), or else
we will arrive at a process i for which there are
no processes j such that must precede (j, i) =
true and processes i and j are in conflicting
region statements. In this latter case there is
no cycle, but process i can enter its
<critical-section> and there is no deadlock.
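
The Cycle Theorem reduces deadlock testing to cycle detection in a directed graph whose edge from p to q exists when the two processes conflict and must precede (p, q) is true. The sketch below uses a standard depth-first search with a global view of the waiting set; it is an illustration of the theorem, not the constant-storage distributed test the paper goes on to develop, and the names find_cycle, waiting, and mode are assumptions.

```python
def find_cycle(waiting, mode, conflict, must_precede):
    """Return a list of waiting processes forming a cycle, or None."""
    waiting = list(waiting)
    # Edge p -> q: p and q conflict and p must precede q.
    succ = {
        p: [q for q in waiting
            if conflict(mode[p], mode[q]) and must_precede(p, q)]
        for p in waiting
    }
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on path, finished
    colour = dict.fromkeys(waiting, WHITE)
    path = []

    def visit(p):
        colour[p] = GREY
        path.append(p)
        for q in succ[p]:
            if colour[q] == GREY:     # back edge closes a cycle
                return path[path.index(q):]
            if colour[q] == WHITE:
                found = visit(q)
                if found:
                    return found
        colour[p] = BLACK
        path.pop()
        return None

    for p in waiting:
        if colour[p] == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None
```

A self-loop, as in the trivial example where must precede (i, i) is true, is reported as the one-element cycle [i].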

We must establish several ground rules for
manipulating faulty processes so that we will have
a common convention with which to work. In
addition to assuming that a failing process does
not behave "maliciously," e.g. by sending off
spurious interrupts to the remaining operational
processes, we further assume that we have some
reliable mechanism for determining whether a
particular process has failed. A process can be
thought of as emitting a "carrier signal"; when
the signal dies, the process has failed.

Processes which fail while on the queue
remain there until some external device repairs
them so that they can eventually enter their
critical sections. We adopt this convention on
the basis of its being the most general scheme for
dealing with the failure of enqueued processes.
"Most general," in the sense used here, means the
ability of this scheme to simulate any other
scheme. This generality arises from the
flexibility of scheduling provided by the
must precede function. For example, we could
easily alter the value of must precede to
effectively ignore the presence of a failed
process on the queue. Of course, we are assuming
that in such a situation the values of the
arguments to must precede can be determined
despite the loss of accessibility to data that
have been stored by malfunctioning processes.

Processes which fail while executing their
critical sections can block many other processes
with which they conflict, thereby causing serious
degradation in system performance. We will assume
such processes are to be removed from their
critical sections by the external mechanism before
being repaired and returned to normal operation.
Note that once in its critical section, a process
is beyond the effects of the must precede
function. Thus we do not have the run-time
flexibility we had when dealing with the failure
of enqueued processes, and we appear to be quite
rigidly bound by whatever scheme we choose for
servicing processes which fail in the midst of
their critical code.

It would be unreasonable to assume that a
process can be made to stop, perform some desired
operation, and resume unless it is under our
control. Thus we cannot expect the cooperation of
processes which are executing their critical
sections or non-critical sections. The only times
a process does come under our control so that it
can be made to perform synchronization tasks are
when it is waiting on the queue and when it is
leaving its critical section.

Because concurrent computations are
inherently difficult to understand (and rigorous
mathematical proofs of their correctness are even
more difficult to comprehend), we will break down
the development of the algorithm into three steps.
In the first version, we deal with a sequential
program that will temporarily serve as our
scheduler and that is easy to comprehend. In the
next version, we transform the sequential program
into a parallel program. At this point we are
halfway to our target program: control of
instruction sequencing has been removed from the
central scheduler and is now managed by the
individual processes, but shared memory is still
utilized. In the final version, we convert this
parallel program into fully distributed code by
passing out the common storage locations among the
component processes. (For notational convenience,
we say i ==> j if conflict (i, j) = true and
must precede (i, j) = true.)

There are several advantages to treating the
development of a distributed program as code
synthesis beginning with a simple statement of the
solution rather than as a programming task
followed by a verification phase. Not only are
proofs of programs (especially parallel programs)
difficult to devise, they are almost as difficult
to understand due to their ad hoc nature. Even if
the rules of verification could be formalized,
mechanical verifiers invariably suffer from
extremely poor efficiency, as the task they are
meant to perform is almost surely intractable.
Synthesizing code by means of simple
transformations need not require a major effort,
just as the compilation of high level sequential
languages into machine level code can be
accomplished efficiently and in a straightforward
manner (presumably because this is a well
understood task). Furthermore, programming
techniques demanding verification suffer because
it is difficult to build new programs upon old
programs. Instead, many papers dealing with
parallel processes seem to begin afresh, defining
low level features, expanding upon them, and
finally verifying what has been developed. On the
other hand, synthesis begins with a small
collection of requisite parameters, and modifies
these to mesh with the low level features of the
system in a top-down fashion.

Version  1 

In  this  initial  version,  we  are  dealing  with 
a very simple sequential program. The scheduler
exists  as  a separate  routine  (which  we  will 
presently  assume  is  immune  to  failure),  and 
governs  the  operation  of  all  other  processes.  A 
macroscopic  view  of  the  operation  of  the  scheduler 
is  given  by  the  flowchart  in  Figure  1. 

There is one very important issue that we
have avoided so far: how do we deal with two or
more processes that simultaneously begin execution
of region statements? Or in terms of our system,
how do we treat processes that signal their
intention to interact with the scheduler when the
scheduler is already busy servicing some other
process? Before proceeding with our description
of the algorithm, we must put this issue to rest
by establishing a method for determining the
relative ordering of such processes.

Optimally, we would like the scheduling
routine to service processes in the same
chronological order in which these processes
signal the scheduler. One solution to this
problem, performed at the implementation level of
the system, would be to let each process dispatch
an interrupt when it wants the attention of the
scheduler. The scheduler, in turn, serves as our
interrupt handler, and it disables all other
interrupts until the process requesting attention
is fully serviced. In this solution, we have
pushed the problem back onto the hardware
mechanism.

Another possible solution might be to let
each process maintain a timer while it is awaiting
the attention of the scheduler. The timer could
be a mechanical clock, or we could let the program
idle in a loop. On each iteration of the loop, a
variable TIMER would be incremented by one. When
the scheduler becomes available, it picks the
process whose timer indicates the longest wait.
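
The timer scheme amounts to a loop counter per waiting process and a maximum search by the scheduler. A minimal sketch follows; the dictionary timers and the helper names are assumptions made for illustration, and the single-threaded tick function stands in for the idle loops that would in reality run at unrelated speeds.

```python
def tick(timers):
    """One iteration of every waiting process's idle loop."""
    for p in timers:
        timers[p] += 1


def pick_longest_waiting(timers):
    """Return the waiting process with the largest TIMER, or None."""
    if not timers:
        return None
    # Ties are broken by dictionary order, which is arbitrary with
    # respect to true arrival time.
    return max(timers, key=timers.get)
```

Note that equal TIMER values leave the choice to an arbitrary tie-break, which is exactly the simultaneity question raised in the text.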


This solution suffers several drawbacks.
Depending upon the response time of the scheduler,
the value stored in the timer could grow
unboundedly large. Even worse, we are dealing
with an asynchronous system, so the timer may not
reflect a true measure of the waiting time (though
if we assume a finite bound on the speed of one
process relative to another, we are guaranteed
that all processes will eventually command the
attention of the scheduler).


In both of these solutions, we have relied
upon an external agent to assume the burden of the
problem. Is it possible to avoid the use of an
external device entirely? We maintain the answer
is no. In any realistic system, there will be a
lower bound on the length of time that can be
measured. If two events occur within this time
span, we are faced with the problem of taking
these seemingly simultaneous events and
determining which of them actually came first.
What choice do we have, but to rely upon an
external arbiter to resolve this dilemma?
Hopefully, such an arbiter would either be capable
of measuring time on a more refined basis, or
would have some other information, unknown to us,
for ordering events.


In our system, the lower bound for measuring
time is the maximum response time of the
scheduler. What we have done, in effect, is to
treat time as a resource, and to insist that
mutual exclusion be maintained on this resource at
those points in time when a process is interacting
with the scheduler. We note in passing that many
systems that have been described in the literature
finesse the issue of simultaneity by assuming the
availability of indivisible or atomic operations.

Version  2 


Continuing with the synthesis of our final
program, we now "snip" the control mechanism to
eliminate the explicit scheduler. The scheduler,
which is still failure-free, can instead be
thought of as existing only in a conceptual form,
transmitting instructions to the individual
processes. By this we mean that the scheduler
issues an instruction which all the processes
compete for and execute. The execution of such an
instruction is finished when all the processes have
completed their portions of the code, or have
failed. The result is a parallel program which
utilizes shared memory.

In reality, each process will have a copy of
the scheduler. These individual copies will
operate in an asynchronous parallel manner by using
a "mutual handshake" concept. When one component
of the scheduler finishes some instruction, it
polls the other components to determine if they
have finished their respective instructions, and
waits until they have done so before proceeding
with the next instruction. Setting a flag at the
beginning and end of each instruction would be a
simple mechanism for determining whether or not
each process had finished its scheduling
instruction. Figure 2 illustrates a sample
instruction for enqueuing a process that begins
executing a region statement.
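
The mutual handshake can be sketched with threads standing in for processes. This is an illustrative simplification under several assumptions: a per-process counter done[i] replaces the begin/end flags mentioned in the text, the counter is unbounded, and failed components are not modelled. Each component writes only its own entry and merely reads the others, in keeping with the rule that a process may not store into another's memory. The log and its lock exist only so the ordering can be inspected; they are not part of the scheme.

```python
import threading
import time


def run_handshake(n_procs, n_instr):
    """Run n_procs scheduler components through n_instr instructions."""
    done = [0] * n_procs            # done[i] is written only by i
    log = []
    log_lock = threading.Lock()     # protects the inspection log only

    def component(i):
        for k in range(1, n_instr + 1):
            with log_lock:
                log.append(("start", i, k))
            # ... this component's portion of instruction k ...
            with log_lock:
                log.append(("finish", i, k))
            done[i] = k             # announce completion of k
            # Poll the other components until all have finished k.
            while any(done[j] < k for j in range(n_procs)):
                time.sleep(0)       # yield while waiting

    threads = [threading.Thread(target=component, args=(i,))
               for i in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the returned log no component starts instruction k+1 before every component has finished instruction k, which is the lock-step behaviour the handshake is meant to provide.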

We also begin to decompose the queue at this
point. Instead of having one process, the common
scheduler, store the configuration of the enqueued
processes, we now let each component process
remember its location within the queue. The
processes on the queue will be strung together in
a linear sequence by a set of multiple pointers.
Each process contains s-element arrays BEFORE and
AFTER. The value of BEFORE[i] is the identifier
of the process which arrived on the queue i
arrivals before the process in which this array is
stored. AFTER has the complementary meaning. We
will sometimes subscript a variable with index i
to emphasize that this variable is local to
process i. Figure 3 provides a global view of the
structure of these arrays.
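
To make the pointer structure concrete, the following sketch builds the BEFORE and AFTER arrays for every process from a given arrival order. The function name build_links, the use of None for "no such process", and the global construction are assumptions made for illustration; in the scheme described above each process holds only its own two arrays. Index 0 is left unused so that index d means "d arrivals away", matching the text.

```python
def build_links(queue, s):
    """Build BEFORE/AFTER arrays for a queue in arrival order.

    queue: list of process ids, earliest arrival first
    s:     number of links kept in each direction
    Returns (before, after), dicts mapping id -> list of length s+1.
    """
    before, after = {}, {}
    for pos, p in enumerate(queue):
        before[p] = [None] * (s + 1)
        after[p] = [None] * (s + 1)
        for d in range(1, s + 1):
            if pos - d >= 0:
                before[p][d] = queue[pos - d]   # d arrivals earlier
            if pos + d < len(queue):
                after[p][d] = queue[pos + d]    # d arrivals later
    return before, after
```

For the queue a, b, c, d with s = 2, process c records b and a in BEFORE and d in AFTER, so the storage per process depends on s rather than on the number of processes.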


The purpose of the multiple links between
processes is twofold. First, should a process
fail, we can still determine which processes
follow it or precede it on the queue simply by
following an alternative link around the
malfunctioning process. And second, the
redundancy of these pointers can be useful for
detecting the failure of processes. Many previous
solutions to the critical section problem assume
that when a process fails, it turns on some sort
of signal that beacons its failure to the
remaining functional processes, so that the
operation of these processes will not be affected.
Clearly this is not an entirely realistic
assumption. We note that if BEFORE_i[j] = k and
AFTER_k[j] does not equal i, then it is quite
likely that either or both of processes i and k
have failed. Further tests involving comparisons
with links from other processes could aid in
pinpointing the exact identity of the
malfunctioning process.
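
The consistency test on paired links can be sketched in Python as follows.
This is a minimal illustration, not the authors' routine: the function name
suspect_pairs, the dictionary layout, and the use of None for an absent link
are all assumptions, and the sketch inspects every process's arrays from one
place, which a real distributed implementation would not do.

```python
def suspect_pairs(before, after, s):
    """Return pairs (i, k) where process i records k as its j-th predecessor
    but process k does not record i as its j-th successor."""
    suspects = []
    for i, links in before.items():
        for j in range(1, s + 1):
            k = links.get(j)
            if k is None:
                continue  # no predecessor at this distance
            if after.get(k, {}).get(j) != i:
                suspects.append((i, k))
    return suspects


# Hypothetical three-process queue a, b, c with s = 1.
before = {"a": {1: None}, "b": {1: "a"}, "c": {1: "b"}}
after = {"a": {1: "b"}, "b": {1: "c"}, "c": {1: None}}
healthy = suspect_pairs(before, after, 1)

# Simulate process a losing its state: its forward link no longer names b.
after["a"] = {1: None}
damaged = suspect_pairs(before, after, 1)
```

A reported pair only says that one of the two processes is suspect; as the
text notes, links from other processes would be needed to decide which.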


Version 3


In  the  third  and  final  version  of  our 
routines,  we  are  ready  to  eliminate  the  scheduler 
completely  and  to  distribute  both  the  memory  and 
control  mechanism  to  the  individual  processes. 
Each  process  has  a copy  of  the  scheduler  and  can 
be  thought  of  as  issuing  instructions  to  itself. 
The processes then operate in conjunction to
determine  which  instructions  should  be  executed 
and  when. 


We are not quite finished, however, due to
the memory requirements that would result from a
naive implementation of the instructions. A
restriction we have placed on our system, along
with the need for a distributed control mechanism,
is that each process use a limited amount of
memory. In other words, each process should have
an address space whose size is independent of n,
the number of processes. Nearly all of the
instructions obey this property, the sole
exception being the deadlock-test operation.

The Cycle Theorem tells us that testing for
deadlock is equivalent to testing for the presence
of cycles in the ==> relation. Phrasing this
another way, a deadlock will exist if and only if
some process p obeys the relationship p ==>+ p,
where ==>+ is the (non-reflexive) transitive
closure of ==>. A deterministic algorithm for
computing the transitive closure on n objects will
undoubtedly proceed by following the ==> relation
from one object to the next and backtracking where
necessary. To prevent some sequence a ==> b ==> c
==> ... ==> z of processes from being examined
repeatedly, it appears necessary to keep a record
of the processes along such chains that have
already been scanned and need not be re-examined.
The number of markers needed to maintain this
record yields O(n) space complexity in the worst
case. Linear space complexity is unfortunate from
our point of view, for even though an amount of
memory proportional to n will be needed to test
for deadlock, no single process can directly
utilize that much space. Thus each of the n
processes must devote a constant amount of memory
toward executing the deadlock-test instruction.
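
For comparison, a conventional centralized test of whether a process
precedes itself under the transitive closure might look like the Python
sketch below. The representation of the relation as a dictionary succ and
the name reaches_itself are assumptions for illustration. The visited set is
the record of markers mentioned above, and it can grow to hold every process,
which is the linear space cost a single process cannot afford.

```python
def reaches_itself(succ, p):
    """Return True if p can be reached from p by following succ one or more times."""
    visited = set()                  # one marker per scanned process: O(n) space
    stack = list(succ.get(p, ()))    # processes still to be examined
    while stack:
        q = stack.pop()
        if q == p:
            return True              # the chain has returned to p: a cycle
        if q in visited:
            continue                 # already scanned, do not re-examine
        visited.add(q)
        stack.extend(succ.get(q, ()))
    return False


# Hypothetical relation: a, b, c form a cycle; d merely leads into it.
succ = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
on_cycle = reaches_itself(succ, "a")
off_cycle = reaches_itself(succ, "d")
```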

To see if process i has caused a deadlock,
process i turns its flag CYCLE_i to ON. Each process
k other than i checks to see if there is a process
j such that CYCLE_j = ON and j ==> k. If so,
process k sets CYCLE_k to ON, establishing one more
link in the potential deadlock cycle. Eventually,
either no more processes can set their values of
CYCLE to ON (in which case there can be no
deadlock) and the test ends, or some process k
such that k ==> i sets CYCLE_k to ON, and process i
notes the completed cycle and announces the
presence of a deadlock. This deadlock check
algorithm is outlined by the flowchart in Figure
4.
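
The flag-propagation test can be simulated sequentially as in the Python
sketch below. It is a sketch under stated assumptions rather than the
authors' Figure 4: the relation is given as a dictionary succ, the rounds of
polling are modeled by a simple loop, and the name deadlock_test is
hypothetical. Each process owns exactly one boolean flag and writes only that
flag, so the storage per process does not depend on n.

```python
def deadlock_test(succ, processes, i):
    """Simulate the test started by process i; return True if i lies on a cycle.

    succ[j] lists every process k that j is related to (j must come before k).
    """
    cycle = {p: False for p in processes}   # one flag per process
    cycle[i] = True                          # process i turns its flag ON
    changed = True
    while changed:
        changed = False
        for k in processes:
            if k == i or cycle[k]:
                continue
            # k looks for a flagged process j that is related to k
            if any(cycle[j] and k in succ.get(j, ()) for j in processes):
                cycle[k] = True              # k writes only its own flag
                changed = True
                if i in succ.get(k, ()):
                    return True              # k is related to i: cycle closed
    return False                             # no flag can change any more


# Hypothetical examples over four processes.
processes = ["a", "b", "c", "d"]
cyclic = deadlock_test({"a": ["b"], "b": ["c"], "c": ["a"]}, processes, "a")
acyclic = deadlock_test({"a": ["b"], "b": ["c"]}, processes, "a")
```

The loop ends after at most n rounds, since every round that continues must
turn on at least one new flag.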


Possibly  the  first  feature  of  version  2 that 
strikes  the  reader  is  that  memory  management  has 
been  almost  entirely  divided  among  the  constituent 
processes.  This  division  of  memory  management  has 
been  one  of  our  prime  objectives  from  the 
beginning,  for  in  order  to  conform  to  the 
definition  of  a distributed  system  and  reap  the 
fault-tolerant capabilities such systems have to
offer, we must ensure that individual processes
perform write operations only on their own local
memories. An examination of the instructions of
version 2 reveals that all of the instructions
cause  process  i to  alter  only  the  contents  of  its 
own  memory. 


An analysis of the requirements for the
deadlock-test instruction shows that it can fail
in either of two circumstances: more than the
designated number of consecutive processes fail
simultaneously (in which case the remaining
operational processes will not be able to assume
responsibility for all of their malfunctioning
counterparts), or all the processes on the queue
fail simultaneously. However, neither of these
conditions is too important. We have ruled out
the first case (or at least know the probability
of its happening). In the second case, there are
no operational processes on the queue, so that
none could possibly enter their critical sections.
Deadlock is therefore inconsequential.

Note that we have been very liberal in
allowing the user to risk potential deadlock
situations. As a result, our deadlock detection
routine incurs a great deal of run-time expense in
the form of process cross-talk. One possible
alternative to the scheme presented here is
somewhat more conservative in nature. Instead of
permitting the possibility of deadlock at
compile-time and checking for its presence at
run-time, we disallow definitions of must-precede
that would allow a deadlock to develop when
certain combinations of processes are enqueued.
This compile-time check is simple: we assume all
processes are on the queue, and use our deadlock
tester to see if a cycle is present. If no cycle
exists under these circumstances, no cycle can
ever exist, and the system is guaranteed to be
deadlock-free. Otherwise, the user is informed
that deadlock may develop in the future. Thus we
need to test for deadlock only when must-precede
changes, and not whenever a new process enters the
queue.
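
A compile-time check of this kind could be sketched in Python as follows.
Note the substitution: instead of the deadlock tester described in this
paper, the sketch uses Kahn's topological-sort algorithm, a standard
centralized way to decide whether a relation has a cycle. The names
is_deadlock_free and must_precede are assumptions for illustration, and the
sketch assumes every process named in the relation also appears in the
process list.

```python
from collections import deque


def is_deadlock_free(must_precede, processes):
    """Return True if must_precede, taken over all processes, has no cycle.

    must_precede[p] lists the processes that p must precede.
    """
    indegree = {p: 0 for p in processes}
    for p in processes:
        for q in must_precede.get(p, ()):
            indegree[q] += 1
    # Repeatedly remove processes that nothing remaining must precede.
    ready = deque(p for p in processes if indegree[p] == 0)
    removed = 0
    while ready:
        p = ready.popleft()
        removed += 1
        for q in must_precede.get(p, ()):
            indegree[q] -= 1
            if indegree[q] == 0:
                ready.append(q)
    # Any process never removed sits on, or downstream of, a cycle.
    return removed == len(processes)


# Hypothetical definitions of the relation over three processes.
safe = is_deadlock_free({"a": ["b"], "b": ["c"]}, ["a", "b", "c"])
unsafe = is_deadlock_free({"a": ["b"], "b": ["c"], "c": ["a"]}, ["a", "b", "c"])
```

Because any set of enqueued processes is a subset of all processes, a
relation that passes this check cannot produce a cycle at run time, which is
why the test needs repeating only when the relation itself changes.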


FAILURE ANALYSIS


order of requests for entering the critical
regions is maintained and can be used for
scheduling purposes.

3) All variables involved in the synchronization
process assume values from a finite range.

4) All processes need to store only a small
amount of data to maintain the synchronization
scheme; by "small" we mean an amount that is
independent of the number of processors in the
system.

5) The failure and subsequent restart of any
individual process or even a reasonably small
subset of processes will not cause a widespread
system malfunction.

6) The creation of a cycle of
must-precede-related processes and the resulting
deadlock can be detected, though we do not specify
what course of action should be taken from that
point on.

Most importantly, we have demonstrated a
technique for transforming an easy-to-understand
sequential program into a distributed program.
Each step of the transformation is reasonably
straightforward. We have attempted to find
natural lines along which to decompose our
program. With a greater effort, we might hope to
formalize the transformation process, possibly to
the point where it could be mechanized.


One drawback of our system is that under
extreme circumstances the entire system may fail.
Such a situation would arise if groups of
operational enqueued processes were separated by
so many failed processes that the former could not
use the information contained in BEFORE and AFTER
to derive the relative ordering of the groups. If
each of these arrays has s elements, at least 2s
consecutive processes on the queue must be down at
the same time for the system to collapse. The
probability of such a failure occurring is given
by the formula

where p is the probability that an individual
process will be nonoperational at any particular
moment. By making s as large as we desire, this
probability becomes arbitrarily small.


CONCLUSIONS


We  have  demonstrated  a solution  to  the 
critical  section  problem  for  distributed  systems 
that  satisfies  the  stated  design  requirements: 

1) It permits arbitrary processes to conflict
or not conflict depending upon the particular
critical regions they are attempting to enter.

2)  It  allows  granting  access  to  critical  regions 
based  upon  an  arbitrary  scheduling  function.  The 


Our results point to several other areas that
should be examined. For example, we have
described one notion of deadlock, when in fact
there exists another rather obvious form of
deadlock with which we have not dealt. If a
process is waiting on the queue for its when
condition to turn true, but no other conflicting
process has yet arrived which can alter this when
condition, then this process, along with all
enqueued conflicting processes which it
must-precede, will sit idle. Determining whether
a process will alter any variables and thereby
change when conditions is recursively undecidable,
so it may not be feasible to build a mechanism to
accurately detect or correct this type of
deadlock. Is this an important consideration
among real parallel routines? If so, will
heuristic deadlock testers suffice to make this a
negligible problem?

Furthermore,  we  have  been  able  to  develop  a 
reasonably  simple  algorithm  by  passing  the  details 
of scheduling, in the form of conflict and
must-precede  relations,  to  the  user.  While  this 
gives  the  user  a great  deal  of  flexibility,  this 
flexibility  must  be  accompanied  by  a certain 
measure  of  responsibility.  Is  all  this 
flexibility  necessary?  Or  must  the  user  pay  for 
it  in  terms  of  the  extreme  care  taken  to  program 
scheduling  relations?  And  are  there  techniques  he 
might  employ  developing  these  relations  that  would 
allow  the  synchronization  protocols  to  execute 
with  greater  efficiency? 

Another area for further study centers around
the implementation of the queue. Our
multiply-linked list is a "stretched-out" data
structure, in the sense that it does not require a
large set of malfunctioning processes to form a
cut set and thereby cause the system to fail. Are
there alternative data structures which require a
larger cut set to separate and therefore present a
lower probability of system failure? And exactly
what would be the tradeoff between the improved
reliability of these structures and the increased
complexity and reduced efficiency of the code for
the critical section problem?

Acknowledgements: Prof. Mark K. Brown provided
some useful suggestions related to the failure
analysis. Prof. Alan J. Perlis made several
very insightful comments upon the nature and
limitations of parallel systems.

Figure 1: Version 1 Common Scheduler


when  process  j 
begins  execution 
of  region  statement 


determine  the  distance, 
from  process  i to  the 
end  of  the  queue 


BEFORE := last s
elements  on  the  queue 


Figure 2: Version 2 Instruction for Process i: Enqueue Process j.